Final Report


The principal thrust of the research funded by this award was continued
elucidation of characteristics of the protein folding code. In work supported by
another ONR award (N00014-91-J-1943), it was demonstrated that there exists
a short-range folding code in proteins, and the characteristics of that code were
investigated. In the present work, we have explored the possibility that there
exists an inverse folding code in proteins. This is the code that determines
which sequence fragments can be accommodated by a given local chain fold.

We have demonstrated¹ that an inverse folding code indeed exists, and
that it has the same characteristics as the direct folding code (which addresses
the structures permitted by a given sequence). In particular, it was shown that
~73% of four-residue structure fragments show a preference for particular
distributions of sequences, and that the remainder do not. We suggest
that these non-coding structure elements instead act as hot spots in which
mutations are tolerated with no cost in short-range folding energy.
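One way to picture a "preference for particular distributions of sequences" is to compare the sequences observed in a given structure fragment against the background distribution over all fragments. The sketch below uses relative entropy (Kullback-Leibler divergence) as the measure. This is an illustrative assumption, not necessarily the statistic used in the work reported here, and the counts and the function name `relative_entropy` are hypothetical.

```python
import math
from collections import Counter

def relative_entropy(observed_counts, background_counts):
    """Kullback-Leibler divergence, in bits, of the sequence distribution
    observed for one structure fragment from the background distribution.
    Every observed key is assumed to occur in the background as well."""
    n_obs = sum(observed_counts.values())
    n_bg = sum(background_counts.values())
    total = 0.0
    for key, count in observed_counts.items():
        p = count / n_obs                   # frequency within this fragment
        q = background_counts[key] / n_bg   # frequency over all fragments
        total += p * math.log2(p / q)
    return total

# Toy counts (invented for illustration only).
background = Counter({"hhpp": 50, "pphh": 50})
coding_fragment = Counter({"hhpp": 8, "pphh": 2})      # skewed: shows a preference
noncoding_fragment = Counter({"hhpp": 5, "pphh": 5})   # matches the background

print(relative_entropy(coding_fragment, background))
print(relative_entropy(noncoding_fragment, background))
```

A value near zero corresponds to a structure fragment that accepts sequences in roughly their background proportions, which is how a non-coding "hot spot" would appear under this measure.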

This is the first study to consider the inverse folding problem from the
viewpoint of short-range interactions. All previous studies of this problem have
considered the manner in which sequence substitution affects long-range
interactions. Combining our results with those of other workers, in a study
of mutation frequencies in protein sequences, therefore has the potential to
show the relative importance of short- and long-range interactions in
determining mutation probabilities.

We are also continuing development work on methodology in this area. In
recent computations, we have investigated the effect of amino acid clustering
methods, useful for sequence data compression, on coding results.
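As an illustration of how amino acid clustering compresses sequence data, the sketch below maps the 20 amino acids onto a small number of classes and counts four-residue fragments in the reduced alphabet. The five-class grouping in `CLUSTERS` is a hypothetical choice made for this example and is not the clustering investigated in the report.

```python
from collections import Counter

# Hypothetical grouping of the 20 amino acids into 5 classes
# (an assumption for illustration only).
CLUSTERS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "c": "DEKR",    # charged
    "s": "GP",      # conformationally special
}
RESIDUE_TO_CLASS = {aa: label for label, members in CLUSTERS.items() for aa in members}

def compress(sequence):
    """Rewrite a one-letter amino acid sequence in the reduced alphabet."""
    return "".join(RESIDUE_TO_CLASS[aa] for aa in sequence.upper())

def fragment_counts(sequence, length=4):
    """Count all overlapping fragments of the given length."""
    return Counter(sequence[i:i + length] for i in range(len(sequence) - length + 1))

print(compress("ACDG"))
print(20 ** 4, "possible four-residue fragments before clustering")
print(len(CLUSTERS) ** 4, "possible four-residue fragments after clustering")
```

With this grouping, the number of distinct four-residue sequence fragments falls from 160,000 to 625, so each fragment is sampled far more often in a structure database of fixed size. The trade-off is that distinctions between residues within a class are lost, which is why the choice of clustering can affect coding results.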

A new area of interest initiated under funding from this award is the
investigation of the connection between sequence properties and structural
symmetry in proteins. Some protein structures show a high degree of
three-dimensional symmetry, and physical intuition suggests that this structural
property must be somehow reflected in the distribution of amino acid properties
along the chain. We are developing Fourier methods to search for statistically
significant periodicities of amino acid properties which may be linked to
specific details of protein structure.
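A minimal sketch of such a Fourier search follows, assuming hydropathy as the amino acid property (Kyte-Doolittle scale) and a shuffling test for significance. Both choices, and all function names, are assumptions made for illustration; the report does not specify the property or the significance test used.

```python
import cmath
import random

# Kyte-Doolittle hydropathy scale (one possible choice of property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def power_spectrum(values):
    """Return (period in residues, power) pairs from a direct DFT
    of the mean-centered property profile."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum

def strongest_period(sequence, scale=KYTE_DOOLITTLE):
    """Return the (period, power) pair with the largest power."""
    values = [scale[aa] for aa in sequence]
    return max(power_spectrum(values), key=lambda item: item[1])

def permutation_p_value(sequence, n_shuffles=200, seed=0):
    """Estimate how often a shuffled sequence of the same composition
    reaches a peak power at least as large as the observed one."""
    rng = random.Random(seed)
    observed = strongest_period(sequence)[1]
    residues = list(sequence)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if strongest_period("".join(residues))[1] >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)

toy_sequence = "LK" * 10  # artificial sequence with an exact two-residue repeat
period, power = strongest_period(toy_sequence)
print(period, permutation_p_value(toy_sequence))
```

Shuffling preserves amino acid composition while destroying order, so a small p-value indicates a periodicity that composition alone does not explain. Under this picture, a chain of N residues built from m symmetry-related repeats would be expected to show a peak near a period of N/m residues.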